88 research outputs found

    Algorithms for viral haplotype reconstruction and bacterial metagenomics: resolving fine-scale variation in next generation sequencing data

    The discovery of DNA has been one of the biggest catalysts in genomic research. Sequencing has enabled us to access the wealth of information encoded in DNA and has provided the basis for ground-breaking achievements such as the first complete human genome sequence. Furthermore, it has tremendously advanced our understanding of life-threatening genetic disorders and bacterial and viral infections. With the recent advent of next generation sequencing (NGS) technologies, sequencing became accessible to the majority of researchers and made metagenomic sequencing widely available. However, to realise its true potential, sophisticated and tailor-made bioinformatic programs are essential to translate the collected data into meaningful information. My thesis explored the potential of resolving fine-scale variation in NGS data. The identification and correction of artificial fine-scale variation in the form of biases and errors is imperative in order to draw valid conclusions. Furthermore, resolving natural fine-scale variation in the form of single nucleotide polymorphisms (SNPs) and closely related species or strains is critical for the development of effective treatments and the characterisation of diseases. In recent years, Illumina has emerged as the global market leader in DNA sequencing. However, biases and errors associated with this high-throughput sequencing technology are still poorly understood, which has precluded the development of effective noise removal algorithms. In addition, many programs were not designed for Illumina data or metagenomic sequencing. Therefore, a better understanding of the idiosyncrasies encountered in Illumina data is essential, and programs must be tested and benchmarked on realistic and reliable in silico data sets to reveal not only their true capacities but also their limitations. I conducted the largest in vitro study of Illumina error profiles in combination with state-of-the-art library preparation methods to date. 
For the first time, a direct connection between experimental design factors and systematic errors was established, providing detailed insight into the nature of Illumina errors. Further, I tested various error removal techniques and developed a sophisticated Illumina amplicon noise removal algorithm, enabling researchers to choose optimal processing strategies for their particular data sets. In addition, I devised several simulation tools that accurately reflect artificial and natural fine-scale variation. This includes a flexible and efficient read simulation program which is the only program that can directly reflect the impact of experimental design factors. Furthermore, I developed a program simulating the evolution of a virus into a quasi-species. These programs formed the basis for two comprehensive benchmarking studies that revealed the capacities and limitations of viral haplotype reconstruction programs and taxonomic classification programs, respectively. My work furthers our knowledge of Illumina sequencing errors and will facilitate more accurate and effective analyses of sequencing data sets

    Metagenomic sequencing unravels gene fragments with phylogenetic signatures of O2-tolerant NiFe membrane-bound hydrogenases in lacustrine sediment

    Many promising hydrogen technologies utilising hydrogenase enzymes have been slowed by the fact that most hydrogenases are extremely sensitive to O2. Within the group 1 membrane-bound NiFe hydrogenases, naturally occurring tolerant enzymes do exist, and O2 tolerance has been largely attributed to changes in iron–sulphur clusters coordinated by different numbers of cysteine residues in the enzyme’s small subunit. Indeed, previous work has provided a robust phylogenetic signature of O2 tolerance [1], which, when combined with new sequencing technologies, makes bioprospecting in nature a far more viable endeavour. However, making sense of such a vast diversity is still challenging and could be simplified if known species with O2-tolerant enzymes were annotated with information on metabolism and natural environments. Here, we utilised a bioinformatics approach to compare O2-tolerant and O2-sensitive membrane-bound NiFe hydrogenases from 177 bacterial species with fully sequenced genomes for differences in their taxonomy, O2 requirements, and natural environment. Following this, we interrogated a metagenome from lacustrine surface sediment for novel hydrogenases via high-throughput shotgun DNA sequencing using the Illumina™ MiSeq platform. We found 44 new NiFe group 1 membrane-bound hydrogenase sequence fragments, five of which segregated with the tolerant group on the phylogenetic tree of the enzyme’s small subunit, and four with the large subunit, indicating de novo O2-tolerant protein sequences that could help engineer more efficient hydrogenases

    Illumina error profiles : resolving fine-scale variation in metagenomic sequencing data

    Background: Illumina’s sequencing platforms are currently the most utilised sequencing systems worldwide. The technology has rapidly evolved over recent years and provides high throughput at low costs with increasing read-lengths and true paired-end reads. However, data from any sequencing technology contains noise, and our understanding of the peculiarities and sequencing errors encountered in Illumina data has lagged behind this rapid development. Results: We conducted a systematic investigation of errors and biases in Illumina data based on the largest collection of in vitro metagenomic data sets to date. We evaluated the Genome Analyzer II, HiSeq and MiSeq and tested state-of-the-art low-input library preparation methods. Analysing in vitro metagenomic sequencing data allowed us to determine biases directly associated with the actual sequencing process. The position- and nucleotide-specific analysis revealed a substantial bias related to motifs (3mers preceding errors) ending in “GG”. On average, the top three motifs were linked to 16% of all substitution errors. Furthermore, a preferential incorporation of ddGTPs was recorded. We hypothesise that all of these biases are related to the engineered polymerase and ddNTPs which are intrinsic to any sequencing-by-synthesis method. We show that quality-score-based error removal strategies can on average remove 69% of the substitution errors; however, the motif bias remains. Conclusion: Single-nucleotide polymorphism changes in bacterial genomes can cause significant changes in phenotype, including antibiotic resistance and virulence; detecting them within metagenomes is therefore vital. Current error removal techniques are not designed to target the peculiarities encountered in Illumina sequencing data and other sequencing-by-synthesis methods, causing biases to persist and potentially affect any conclusions drawn from the data. 
In order to develop effective diagnostic and therapeutic approaches we need to be able to identify systematic sequencing errors and distinguish these errors from true genetic variation
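The motif analysis described in this abstract (3mers preceding errors) can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes a read already aligned to its reference without indels, takes the motif from the reference rather than the read, and uses hypothetical function names and toy sequences.

```python
from collections import Counter


def motifs_before_errors(reference, read, k=3):
    """Count the k-mers that immediately precede each substitution error.

    Assumption: `read` is aligned to `reference` position by position
    (no insertions or deletions). Errors within the first k positions
    are skipped because they have no complete preceding k-mer.
    """
    counts = Counter()
    for i in range(k, min(len(reference), len(read))):
        if read[i] != reference[i]:
            counts[reference[i - k:i]] += 1
    return counts


def top_motif_share(counts, n=3):
    """Fraction of all counted errors that follow the n most common motifs."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(c for _, c in counts.most_common(n)) / total


# Toy data (hypothetical): three substitutions, two of them after "CGG".
reference = "ACGGTACGGAATTGGC"
read = "ACGGAACGGCAGTGGC"
counts = motifs_before_errors(reference, read)
print(counts.most_common())
print(top_motif_share(counts, n=1))
```

On real data the same tally would be accumulated over many reads before ranking motifs, so that a figure such as the share of errors linked to the top three motifs can be estimated.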

    A Comprehensive Benchmarking Study of Protocols and Sequencing Platforms for 16S rRNA Community Profiling

    In the last 5 years, the rapid pace of innovations and improvements in sequencing technologies has completely changed the landscape of metagenomic and metagenetic experiments. Therefore, it is critical to benchmark the various methodologies for interrogating the composition of microbial communities, so that we can assess their strengths and limitations. The most common phylogenetic marker for microbial community diversity studies is the 16S ribosomal RNA gene, and in the last 10 years the field has moved from sequencing a small number of amplicons and samples to more complex studies where thousands of samples and multiple different gene regions are interrogated. Results: We assembled 2 synthetic communities with an even (EM) and uneven (UM) distribution of archaeal and bacterial strains and species, as metagenomic control material, to assess performance of different experimental strategies. The 2 synthetic communities were used in this study to highlight the limitations and the advantages of the leading sequencing platforms: MiSeq (Illumina), RSII (Pacific Biosciences), 454 GS-FLX/+ (Roche), and IonTorrent (Life Technologies). We describe an extensive survey based on synthetic communities using 3 experimental designs (fusion primers, universal tailed tag, ligated adaptors) across the 9 hypervariable 16S rDNA regions. We demonstrate that library preparation methodology can affect data interpretation due to different error and chimera rates generated during the procedure. The observed community composition was always biased, to a degree that depended on the platform, sequenced region and primer choice. However, crucially, our analysis suggests that 16S rRNA sequencing is still quantitative, in that relative changes in abundance of taxa between samples can be recovered, despite these biases. Conclusion: We have assessed a range of experimental conditions across several next generation sequencing platforms using the most up-to-date configurations. 
We propose that the choice of sequencing platform and experimental design needs to be taken into consideration in the early stage of a project by running a small trial consisting of several hypervariable regions to quantify the discriminatory power of each region. We also suggest that the use of a synthetic community as a positive control would be beneficial to identify the potential biases and procedural drawbacks that may lead to data misinterpretation. The results of this study will serve as a guideline for making decisions on which experimental condition and sequencing platform to consider to achieve the best microbial profiling
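One way to see how relative changes between samples can survive a biased protocol is the following sketch. It rests on an assumption that the abstract does not state: that each taxon is over- or under-detected by a fixed multiplicative factor that is the same in every sample. All taxa, abundances, and bias factors here are made up for illustration.

```python
def observed_proportions(true_abundance, bias):
    """Apply a per-taxon multiplicative bias, then renormalise to proportions.

    Assumption: the bias factor of a taxon is constant across samples.
    """
    weighted = {taxon: true_abundance[taxon] * bias[taxon] for taxon in true_abundance}
    total = sum(weighted.values())
    return {taxon: w / total for taxon, w in weighted.items()}


def ratio_fold_change(sample_a, sample_b, taxon, reference_taxon):
    """Fold change from sample A to B of the ratio taxon : reference_taxon."""
    ratio_a = sample_a[taxon] / sample_a[reference_taxon]
    ratio_b = sample_b[taxon] / sample_b[reference_taxon]
    return ratio_b / ratio_a


# Hypothetical communities: taxon X is four times more abundant in sample B.
true_a = {"X": 10, "Y": 10, "Z": 10}
true_b = {"X": 40, "Y": 10, "Z": 10}
bias = {"X": 0.2, "Y": 1.0, "Z": 3.0}  # hypothetical detection efficiencies

obs_a = observed_proportions(true_a, bias)
obs_b = observed_proportions(true_b, bias)

print(obs_a["X"])  # far from the true proportion of 1/3
print(ratio_fold_change(obs_a, obs_b, "X", "Y"))  # close to the true fold change
```

Under this model the per-taxon factors cancel when the same pair of taxa is compared across samples, even though each observed composition is distorted. Whether real protocols behave this way is an empirical question, which is what a synthetic community used as a positive control can help assess.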

    Delayed presentation of acute ischemic strokes during the COVID-19 crisis

    This article is made available for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. Background: The COVID-19 pandemic has disrupted established care paths worldwide. Patient awareness of the pandemic and executive limitations imposed on public life have changed the perception of when to seek care for acute conditions in some cases. We sought to study whether there is a delay in presentation for acute ischemic stroke patients in the first month of the pandemic in the US. Methods: The interval between last-known-well (LKW) time and presentation of 710 consecutive patients presenting with acute ischemic strokes to 12 stroke centers across the US was extracted from a prospectively maintained quality database. We analyzed the timing and severity of the presentation in the baseline period from February to March 2019 and compared results with the timeframe of February and March 2020. Results: There were 320 patients in the 2-month baseline period in 2019; there was a marked decrease in patients from February to March of 2020 (227 patients in February and 163 patients in March). There was no difference in the severity of the presentation between groups and no difference in age between the baseline and the COVID period. The mean interval from LKW to the presentation was significantly longer in the COVID period (603±1035 min) compared with the baseline period (442±435 min, P<0.02). Conclusion: We present data supporting an association between public awareness and limitations imposed on public life during the COVID-19 pandemic in the US and a delay in presentation for acute ischemic stroke patients to a stroke center

    Novel cyclic di-GMP effectors of the YajQ protein family control bacterial virulence

    Bis-(3′,5′) cyclic di-guanylate (cyclic di-GMP) is a key bacterial second messenger that is implicated in the regulation of many critical processes that include motility, biofilm formation and virulence. Cyclic di-GMP influences diverse functions through interaction with a range of effectors. Our knowledge of these effectors and their different regulatory actions is far from complete, however. Here we have used an affinity pull-down assay using cyclic di-GMP-coupled magnetic beads to identify cyclic di-GMP binding proteins in the plant pathogen Xanthomonas campestris pv. campestris (Xcc). This analysis identified XC_3703, a protein of the YajQ family, as a potential cyclic di-GMP receptor. Isothermal titration calorimetry showed that the purified XC_3703 protein bound cyclic di-GMP with a high affinity (Kd ~2 μM). Mutation of XC_3703 led to reduced virulence of Xcc to plants and alteration in biofilm formation. Yeast two-hybrid and far-western analyses showed that XC_3703 was able to interact with XC_2801, a transcription factor of the LysR family. Mutation of XC_2801 and XC_3703 had partially overlapping effects on the transcriptome of Xcc, and both affected virulence. Electromobility shift assays showed that XC_3703 positively affected the binding of XC_2801 to the promoters of target virulence genes, an effect that was reversed by cyclic di-GMP. Genetic and functional analysis of YajQ family members from the human pathogens Pseudomonas aeruginosa and Stenotrophomonas maltophilia showed that they also specifically bound cyclic di-GMP and contributed to virulence in model systems. The findings thus identify a new class of cyclic di-GMP effector that regulates bacterial virulence

    Sex and gender in infection and immunity: addressing the bottlenecks from basic science to public health and clinical applications

    Although sex and gender are recognized as major determinants of health and immunity, their role is rarely considered in clinical practice and public health. We identified six bottlenecks preventing the inclusion of sex and gender considerations from basic science to clinical practice, precision medicine and public health policies: (i) a terminology-related bottleneck, linked to the definitions of sex and gender themselves, and the lack of consensus on how to evaluate gender; (ii) a data-related bottleneck, due to gaps in sex-disaggregated data, data on trans/non-binary people and gender identity; (iii) a translational bottleneck, limited by animal models and the underrepresentation of gender minorities in biomedical studies; (iv) a statistical bottleneck, with inappropriate statistical analyses and interpretation of results; (v) an ethical bottleneck posed by the underrepresentation of pregnant people and gender minorities in clinical studies; and (vi) a structural bottleneck, as systemic bias and discrimination affect not only academic research but also decision makers. We specify guidelines for researchers, scientific journals, funding agencies and academic institutions to address these bottlenecks. Following such guidelines will support the development of more efficient and equitable care strategies for all

    Species-level functional profiling of metagenomes and metatranscriptomes.

    Functional profiles of microbial communities are typically generated using comprehensive metagenomic or metatranscriptomic sequence read searches, which are time-consuming, prone to spurious mapping, and often limited to community-level quantification. We developed HUMAnN2, a tiered search strategy that enables fast, accurate, and species-resolved functional profiling of host-associated and environmental communities. HUMAnN2 identifies a community's known species, aligns reads to their pangenomes, performs translated search on unclassified reads, and finally quantifies gene families and pathways. Relative to pure translated search, HUMAnN2 is faster and produces more accurate gene family profiles. We applied HUMAnN2 to study clinal variation in marine metabolism, ecological contribution patterns among human microbiome pathways, variation in species' genomic versus transcriptional contributions, and strain profiling. Further, we introduce 'contributional diversity' to explain patterns of ecological assembly across different microbial community types
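The tiered strategy outlined in this abstract can be sketched in miniature. The code below is a toy illustration and not HUMAnN2's implementation or API: exact substring matching stands in for read alignment, translation uses only reading frame 0 of the forward strand, the species-identification tier is assumed to have already selected the pangenomes passed in, and every name and sequence is hypothetical.

```python
from collections import Counter

# Standard genetic code, codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
_CODONS = [a + b + c for a in _BASES for b in _BASES for c in _BASES]
CODON_TABLE = dict(zip(_CODONS, _AMINO))


def translate(read):
    """Translate frame 0 of a DNA read; unknown codons become 'X'."""
    usable = len(read) - len(read) % 3
    return "".join(CODON_TABLE.get(read[i:i + 3], "X") for i in range(0, usable, 3))


def nucleotide_hit(read, pangenomes):
    """Return (species, gene) for the first pangenome gene containing the read."""
    for species, genes in pangenomes.items():
        for gene, sequence in genes.items():
            if read in sequence:
                return species, gene
    return None


def tiered_profile(reads, pangenomes, protein_db):
    """Count gene hits: nucleotide tier first, translated tier for the rest."""
    counts = Counter()
    for read in reads:
        hit = nucleotide_hit(read, pangenomes)
        if hit is not None:
            counts[hit] += 1
            continue
        # Fallback tier: compare the translated read against protein families.
        protein = translate(read)
        for family, sequence in protein_db.items():
            if protein and protein in sequence:
                counts[("unclassified", family)] += 1
                break
    return counts


# Toy inputs (hypothetical sequences and names).
pangenomes = {"species_A": {"geneX": "ATGGCTAAAGGTCTGTAA"}}
protein_db = {"familyY": "MKPLV"}
reads = ["GCTAAAGGT", "ATGAAACCG", "TTTTTTTTT"]
print(tiered_profile(reads, pangenomes, protein_db))
```

The design point the sketch tries to convey is ordering: the cheap, specific nucleotide tier handles reads from known species, so only the remainder reaches the slower translated tier, and reads matching nothing are left uncounted.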